In [ ]:
from __future__ import print_function, division
Python is a general-purpose programming language. It is used extensively for scientific computing, data analytics and visualization, web development and software development. It has a wide user base and excellent library support.
There are many ways to use and interact with the Python language. The most direct way is to call python <script>.py from the command prompt. This runs a script written in Python and does whatever you have programmed the computer to do. But scripts have to be written first, so how do we actually write Python scripts?
Python scripts are just plain text files. So you could open a text file, write a script and save the file with a .py extension. The downsides of this approach are obvious to anyone working with Windows. Usually, Python source code is written, not with Microsoft Word, but with an Integrated Development Environment (IDE). An IDE combines a text editor with a running Python console, so that you can test code and actually do work with Python without switching from one program to another. If you learnt C or C++, you may be familiar with the Vim editor. Popular IDEs for Python include PyCharm, Spyder and the Jupyter Notebook.
In this course, we will use the Jupyter Notebook as our IDE because of its ease of use and its ability to execute code cell by cell. It integrates with markdown so that you can annotate and document your code on the fly! All in all, it is an excellent tool for teaching and learning Python before one migrates to more advanced tools like Spyder for serious scripting and development work.
To get the most from Python, your best source of reference is the Python documentation. Getting good at Python is a matter of using it regularly and familiarizing yourself with the keywords, constructs and commonly used idioms.
Learn to use Shift-Tab when coding. This activates a hovering tooltip that provides documentation for keywords, functions and even variables that you have declared in your environment. This convenient tooltip can be expanded into a pop-up window in your browser for easy reference. Use it often to look up function signatures, documentation and general help.
Jupyter Notebook comes with Tab completion. This quality-of-life feature assists you in typing code by listing possible autocompletion options so that you don't have to type everything out! Use Tab completion as often as you can. It makes coding faster and less tedious. Tab completion also lets you browse the methods available on an object, which comes in handy when learning a library for the first time (like matplotlib or seaborn).
Finally, ask Google. Once you have acquired enough "vocabulary", you can begin to query Google with your problem. More often than not, someone has experienced the same conundrum and left a message on Stack Exchange. Browsing the solutions listed there is a powerful way to learn programming skills.
The learning objectives of this first unit are:
print("Hello world!")
int, str, float and bool.
type function.
list object and list methods.
list. Slicing and indexing.
All code is written in cells. Cells are where code blocks go. You execute a cell by pressing Shift-Enter or pressing the "play" button. Or you could just click on the drop-down menu and select "Run cell", but who would want to do that!
In general, cells have two uses: one for writing "live" Python code which can be executed, and one for writing documentation using markdown. To toggle between the two cell types, press Escape to exit from "edit" mode. The edges of the cell should turn blue. Now you are in "command" mode: Escape activates "command" mode and Enter activates "edit" mode. With the cell border coloured blue, press M to change the cell to markdown. You should see the In [ ]: prompt disappear. Press Enter to change the border to green. This means you can now "edit" markdown. How does one change from markdown back to a live coding cell? In "command" mode (remember, blue border) press Y. Now the cell is "hot". When you press Shift-Enter, you will execute code. If you happen to write markdown in a "coding" cell, the Python kernel will shout at you (that is, it will raise an error message).
Now it's time for you to try. In the cell below, try switching to Markdown. Press Enter to activate "edit" mode and type some text in the cell. Press Shift-Enter and you should see the output rendered in HTML. Note that this is not coding yet.
In [ ]:
# change this cell into a Markdown cell. Then type something here and execute it (Shift-Enter)
In [ ]:
'''Make sure you are in "edit" mode and that this cell is for coding. (You should see the In [ ]:
on the left of the cell.) '''
print("Hello world!")
Notice that Hello world! is printed at the bottom of the cell as an output. In general, this is how the output of Python code is displayed to you.
print is a built-in function in Python. Its purpose is to display output to the console. Notice that we pass an argument, in this case the string "Hello world!", to the function. All arguments passed to a function must be enclosed in round brackets, and this signals to the Python interpreter to execute the function named print with the argument "Hello world!".
Your next exercise is to print your own name to the console. Remember to enclose your name in " " or ' '.
In [ ]:
# print your name in this cell.
Commenting is a way to annotate and document code. There are two ways to do this: inline using the # character, or by using ''' <documentation block> '''. The latter can span multiple lines and is hence used mainly for documenting functions or classes. Strictly speaking, a ''' ''' block is a string rather than a comment; when it is the first statement in a function or class, it is registered as documentation in Jupyter Notebook and can be accessed from the Shift-Tab tooltip!
One should use # style commenting sparingly. Ideally, code should be clear enough that # inline comments are rarely needed.
However, # has another very important function. It is used for debugging and troubleshooting, because commented-out code is never executed when you execute a cell (Shift-Enter).
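As a minimal sketch of both commenting styles, the cell below defines a hypothetical function area_of_rectangle (the name and its arguments are made up for this illustration and are not part of the course material). The triple-quoted block directly under the def line is the text a tooltip would show, while the # lines are skipped by the interpreter.

```python
def area_of_rectangle(width, height):
    '''Return the area of a rectangle with the given width and height.'''
    # Inline comment: this line is ignored when the cell is executed.
    return width * height

# Commenting out a line is a quick way to disable it while debugging.
# print(area_of_rectangle(2, 3))

# The documentation block is stored on the function itself.
doc_text = area_of_rectangle.__doc__
print(doc_text)
```

Removing the # in front of the commented-out print line makes that line "live" again.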
Python is an object-oriented programming language. That means that everything in Python is an object, which is an instance of a class. The main point here is that I am going to introduce 4 basic objects of Python which form the backbone of any program or script.
int. Integers.
str. You've met one of these: "Hello world!". For those who know about character encoding, it is highly encouraged to code Python with UTF-8 encoding.
float. Basically the computer version of real numbers.
bool. In Python, true and false are indicated by the reserved keywords True and False. Take note of the capitalized first letter.
You can't call yourself a scientific computing language without the ability to deal with numbers. The basic arithmetic operations for numbers work exactly as you would expect.
In [ ]:
# Addition
5+3
In [ ]:
# Subtraction
8-9
In [ ]:
# Multiplication
3*12
In [ ]:
# Division
48/12
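Note the floating point answer. In Python 2, / applied to two integers meant floor division unless division was imported from __future__, as is done at the top of this notebook. This is no longer the case in Python 3, where / always performs true division.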
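To make the distinction concrete, here is a small sketch assuming Python 3 (or Python 2 with division imported from __future__). The variable names are chosen only for this illustration. True division uses /, while floor division has its own operator //.

```python
# True division always returns a float.
true_quotient = 7 / 2

# Floor division rounds the result down towards negative infinity.
floor_quotient = 7 // 2
negative_floor = -7 // 2

print(true_quotient)   # 3.5
print(floor_quotient)  # 3
print(negative_floor)  # -4
```

Note that floor division of a negative number rounds down, not towards zero.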
In [ ]:
# Exponentiation. Limited precision though!
16**0.5
In [ ]:
# Residue class modulo n
5%2
In [ ]:
# Guess the output before executing this cell. Come on, don't cheat!
6%(1+3)
It is interesting to note that the % operator is not distributive.
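A quick check of that remark, using the same numbers as the cell above (the variable names are only for illustration): taking the remainder of a sum is not the same as summing the remainders.

```python
n = 6
a, b = 1, 3

combined = n % (a + b)        # 6 % 4
separate = (n % a) + (n % b)  # (6 % 1) + (6 % 3)

print(combined)              # 2
print(separate)              # 0
print(combined == separate)  # False
```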
In general, one does not have to declare variables in Python before using them. We merely need to assign values to them. In the computer, this means that a certain place in memory has been allocated to store that particular number. Assignment to variables is executed by the = operator. Testing for equality is done with the binary comparison operator ==.
Python is case sensitive. So a variable named A is different from a. Variable names cannot begin with a digit and cannot contain spaces. So my variable is not a valid variable name. Usually what is done is to write my_variable instead.
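The sketch below illustrates the difference between = and ==, as well as case sensitivity. The names my_variable and My_Variable are hypothetical and exist only for this example.

```python
my_variable = 10   # assignment: the name my_variable now refers to 10
My_Variable = 20   # a different name, because Python is case sensitive

# Comparison with == evaluates to a bool and does not change either variable.
is_equal = (my_variable == My_Variable)

print(is_equal)           # False
print(my_variable == 10)  # True
```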
After assigning numbers to variables, the variable can be used to represent the number in any arithmetic operation.
In [ ]:
# Assignment
x=1
y=2
In [ ]:
x+y
In [ ]:
x/y
Notice that after assignment, I can access the variables in a different cell. However, if you reassign a variable to a different number, the old values for that variable are overwritten.
In [ ]:
x=5
x+y-2
Now try clicking back to the cell x+y and re-executing it. What do you think the answer will be?
Even though that cell is above our reassignment cell, re-executing it means executing that block of code with the latest value of each variable. It is for this reason that one must be very careful with the order of execution of code blocks. To help us keep track of the order of execution, each cell has a counter next to it. Notice the In [n]. Higher values of n indicate more recent executions.
A variable can also be reassigned in terms of its own current value.
In [ ]:
# For example
x = x+1
print(x)
So what happened here? Well, recall that x was last assigned 5. Therefore x+1 gives us 6. The name x is then reassigned to this new value, so x now refers to 6. We then use the print function to display the content of x.
As this is an often-used pattern, Python has a convenience syntax for this kind of assignment.
In [ ]:
# reset x to 5
x=5
x += 1
print(x)
In [ ]:
x = 5
#What do you think the values of x will be for x -= 1, x *= 2 or x /= 2?
# Test it out in the space below
print(x)
In [ ]:
0.1+0.2
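The Python documentation explains what is happening quite clearly: floating point numbers are stored in binary, and most decimal fractions such as 0.1 cannot be represented exactly as binary fractions, so a tiny representation error shows up in the result. To be fair, even our decimal system is inadequate to represent rational numbers like 1/3, 1/11 and so on with finitely many digits.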
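One common way of living with this, sketched below, is to compare floats with a tolerance instead of ==, or to use exact fractions when exactness matters. This assumes Python 3.5 or later for math.isclose; the variable names are only for illustration.

```python
import math
from fractions import Fraction

total = 0.1 + 0.2

# Direct comparison fails because of the tiny representation error.
print(total == 0.3)              # False

# Comparing within a tolerance is usually what we actually want.
print(math.isclose(total, 0.3))  # True

# Fractions store the numerator and denominator exactly.
exact_total = Fraction(1, 10) + Fraction(2, 10)
print(exact_total)               # 3/10
```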
Strings are basically text. These are enclosed in ' ' or " ". The reason for having two ways of denoting strings is because we may need to nest a string within a string like in 'The quick brown fox "jumped" over the lazy old dog'. This is especially useful when setting up database queries and the like.
In [ ]:
# Noting the difference between printing quoted variables (strings) and printing the variable itself.
x = 5
print(x)
print('x')
In [ ]:
my_name = 'Tang U-Liang'
print(my_name)
In [ ]:
# String formatting: Using the %
age = 35
print('Hello doctor, my name is %s. I am %d years old. I weigh %.1f kg' % (my_name, age, 70.25))
# or using .format method
print("Hi, I'm {name}. Please register {name} for this conference".format(name=my_name))
When using % to indicate string substitution, take note of the common formatting "placeholders":
%s to substitute strings.
%d for printing integer substitutions.
%.1f means to print a floating point number to 1 decimal place. Note that the value is rounded, not simply cut off; an exact tie such as 70.25 is rounded to the even digit, which is why 70.2 is printed above.
The utility of the .format method arises when the same string needs to be printed in various places in a larger body of text. This avoids duplicating code. Also, did you notice I used double quotation marks? Why?
More about string formats can be found in this excellent blog post
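As a small sketch of how the %.1f placeholder treats the digits it drops, the cell below formats a hypothetical weight with both styles. The value 70.26 is chosen only for illustration; the .format call uses the equivalent {:.1f} specification.

```python
weight = 70.26  # hypothetical value, chosen so that rounding is visible

percent_style = '%.1f' % weight
format_style = '{:.1f}'.format(weight)

print(percent_style)  # 70.3
print(format_style)   # 70.3

# An exact tie is rounded to the even digit.
tie = '%.1f' % 70.25
print(tie)            # 70.2
```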
In [ ]:
fruit = 'Apple'
drink = 'juice'
print(fruit+drink) # concatenation
In [ ]:
#Don't like the lack of spacing between words?
print(fruit+' '+drink)
Use [] to access specific letters in the string. Python uses 0 indexing. So the first letter is accessed by my_string[0] while my_string[1] accesses the second letter.
In [ ]:
print(fruit[0])
print(fruit[1])
Slicing is a way to get specific subsets of the string. If you let $x_n$ denote the $(n+1)$-th letter (note zero indexing) in a string (and by letter this includes whitespace characters as well!) then writing my_string[i:j] returns the substring $$x_i, x_{i+1}, \ldots, x_{j-1}$$ of letters in the string. That means the slice [i:j] takes all letters starting from index i and stops one index before the index indicated by j.
Zero indexing and the stopping point convention frequently trip up first-time users. So take special note of this convention. Zero indexing is used throughout Python, including in libraries such as matplotlib and pandas.
In [ ]:
favourite_drink = fruit+' '+drink
print("Printing the first to 3rd letter.")
print(favourite_drink[0:3])
print("\nNow I want to print the second to seventh letter:")
print(favourite_drink[1:7])
Notice the use of \n in the third print function. This is called a newline character, which does exactly what its name says. Also, in the output of the fourth print function, notice the separation between e and j. It is actually not separated. The sixth letter is a whitespace character ' '.
Slicing also utilizes arithmetic progressions to return even more specific subsets of strings. So [i:j:k] means that the slice will return $$ x_{i}, x_{i+k}, x_{i+2k}, \ldots, x_{i+mk}$$ where $m$ is the largest integer such that $i+mk \leq j-1$ if $k > 0$ (resp. $i+mk \geq j+1$ if $k < 0$ and $i > j$).
In [ ]:
print(favourite_drink[0:7:2])
In [ ]:
# Here's a trick, try this out
print(favourite_drink[3:0:-1])
So what happened above? Well, [3:0:-1] means that, starting from the 4th letter $x_3$, which is 'l', return a substring including $x_{2}, x_{1}$ as well. Note that the progression does not include $x_0 =$ 'A' because the stopping point j is non-inclusive.
The slice [:j] means take the substring from the beginning up to the $j$-th letter (i.e. the letter $x_{j-1}$), while [i:] means take the substring from the $(i+1)$-th letter (i.e. the letter $x_{i}$) to the end of the string.
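The conventions above can be checked on a short example. The string 'notebook' is a hypothetical choice for this sketch, and the last line uses range and join, which this unit has not introduced; they are used here only to spell out which indices a stepped slice visits.

```python
word = 'notebook'  # indices: n=0 o=1 t=2 e=3 b=4 o=5 o=6 k=7

first_half = word[:4]    # from the beginning, stopping before index 4
second_half = word[4:]   # from index 4 to the end
stepped = word[1:7:2]    # indices 1, 3, 5

print(first_half)   # note
print(second_half)  # book
print(stepped)      # oeo

# The same letters, picked one index at a time.
by_index = ''.join(word[n] for n in range(1, 7, 2))
print(by_index == stepped)  # True
```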
Print the string favourite_drink in reverse order. How would you do it?
In [ ]:
# Write your answer here and check it with the output below
In [ ]:
x = 5.0
type(x)
In [ ]:
type(favourite_drink)
In [ ]:
type(True)
In [ ]:
type(500)
list, here's where the magic begins
Lists are the fundamental data structure in Python. They are loosely analogous to arrays in C or Java. If you use R, lists are analogous to vectors (and not the R list).
Declaring a list is as simple as using square brackets [ ] to enclose a list of objects (or variables) separated by commas.
In [ ]:
# Here's a list called staff containing his name, his age and current remuneration
staff = ['Andy', 28, 980.15]
In [ ]:
len(staff)
Perhaps you want to recover that staff's name. It's in the first position of the list.
In [ ]:
staff[0]
Notice that Python still outputs to the console even though we did not use the print function. The print function prints a particularly "nice" string representation of the object, which is why Andy would be printed without the quotation marks if print were used.
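The two representations can be compared side by side with the built-in repr function, which returns the form the notebook shows when a cell ends with a bare expression. This is a minimal sketch; the variable names are only for illustration.

```python
name = 'Andy'

nice_form = str(name)   # what print displays
raw_form = repr(name)   # what the notebook echoes for a bare expression

print(nice_form)  # Andy
print(raw_form)   # 'Andy'
```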
Can you find me Andy's age now?
In [ ]:
# type your answer here and run the cell
The same slicing rules for strings apply to lists as well. If we wanted Andy's age and wage, we would type staff[1:3]
In [ ]:
staff[1:3]
In [ ]:
nested_list = ['apples', 'banana', [1.50, 0.40]]
Notice that if I type nested_list[2], Python will return the list [1.5, 0.4]. Its items can be accessed by applying indexing (or slicing) notation [ ] again.
In [ ]:
# Accessing items from within a nested list structure.
print(nested_list[2])
# Assigning nested_list[2] to a variable. The variable price represents a list
price = nested_list[2]
print(type(price))
# Getting the smaller of the two floats
print(nested_list[2][1])
Right now, let us look at three very useful list methods and one operator. Methods are basically operations attached to an object; the three below modify the list they are called on. These are:
pop which allows us to remove an item in a list.
So for example if $x_0, x_1, \ldots, x_n$ are items in a list, calling my_list.pop(r) will modify the list so that it contains only $$x_0, \ldots, x_{r-1}, x_{r+1},\ldots, x_n$$ while returning the element $x_r$.
append which adds items to the end of the list.
Let's say $x_{n+1}$ is the new object you wish to append to the end of the list. Calling the method my_list.append($x_{n+1}$) will modify the list in place so that the list will now contain $$x_0, \ldots, x_n, x_{n+1}$$ Note that append does not return any output!
insert which, as the name suggests, allows us to add items to a list at a particular index location.
When using this, type my_list.insert(r, $x_{n+1}$), where the second argument is the object you wish to insert and r is the position (still 0 indexed) where this object ought to go in the list. This method modifies the list in place and does not return any output. After calling the insert method, the list now contains $$x_0,\ldots, x_{r-1}, x_{n+1}, x_{r}, \ldots, x_n$$ This means that my_list[r] = $x_{n+1}$ while my_list[r+1] = $x_{r}$.
+ is used to concatenate two lists. If you have two (or more) lists and want to join them end to end, use this binary operator.
This works by returning a new list made of the items of the first list followed by the items of the second. So $$[ x_1,\ldots, x_n] + [y_1,\ldots, y_m]$$ is the list containing $$ x_1,\ldots, x_n,y_1, \ldots, y_m$$ Neither original list is changed, so the result is lost unless you assign it to a variable.
In [ ]:
# append
staff.append('Finance')
print(staff)
In [ ]:
# pop away the information about his salary
andys_salary = staff.pop(2)
print(andys_salary)
print(staff)
In [ ]:
# oops, made a mistake, I want to reinsert information about his salary
staff.insert(3, andys_salary)
print(staff)
In [ ]:
contacts = [99993535, "andy@company.com"]
staff = staff+contacts # reassignment of the concatenated list back to staff
print(staff)
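To summarise what each operation returns and which ones change the list in place, here is a self-contained sketch on a hypothetical list called team (the names in it are made up and unrelated to the staff list above).

```python
team = ['Ann', 'Bob']

# append modifies the list in place and returns None.
result = team.append('Cara')
print(result)   # None
print(team)     # ['Ann', 'Bob', 'Cara']

# pop removes the item at the given index and returns it.
removed = team.pop(0)
print(removed)  # Ann

# insert places an item at the given index, shifting later items right.
team.insert(1, 'Dan')
print(team)     # ['Bob', 'Dan', 'Cara']

# + builds a new list and leaves the original untouched.
joined = team + ['Eve']
print(team)     # ['Bob', 'Dan', 'Cara']
print(joined)   # ['Bob', 'Dan', 'Cara', 'Eve']
```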
In [ ]:
staff = ['Andy', 28, 'Finance', 980.15, 99993535, 'andy@company.com']
staff
In [ ]:
# type your answer here
print(staff)
Answer: ['Andy', 'andy@company.com', 28, 'Finance', 980.15, 99993535]
Obviously there is much, much more that can be said about lists. But we have to move on. In the next unit, we will learn how to control program flow with for and if, and meet a new data structure called the dictionary.